GROUP BY

Use GROUP BY to aggregate data by one or more values. When GROUP BY is included in a query, the output dataset includes only the last record for each aggregated group. In the examples below, we use the aggregate function count{}() to get the number of records in each aggregated group.

tip

QQL supports GROUP BY for BOOLEAN, CHAR, DATETIME, TIMEOFDAY, INTEGER, VARCHAR, and ENUM types.
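Any of these types can serve as a grouping key. For instance, a BOOLEAN field splits the output into at most two groups. In this sketch, the field name isBuy is hypothetical and stands in for any BOOLEAN field in your stream schema:

```
-- isBuy is a hypothetical BOOLEAN field used for illustration
SELECT isBuy, count{}() FROM binance GROUP BY isBuy
```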

For example, let's group by a specific field:

Examples with GROUP BY keyword
-- Output dataset includes the last records aggregated by symbol.
SELECT * FROM binance GROUP BY symbol

-- Aggregates the output by entry.level.
-- The count{}() aggregate function returns a record count for each aggregated group.
SELECT entry.*, entry.level, count{}()
FROM binance
ARRAY JOIN entries AS entry
GROUP BY entry.level

You can group by more than one value:

-- Output dataset includes the last records aggregated by entry.level and symbol;
-- the count{}() aggregate function returns a record count for each aggregated group.
SELECT entry.*, entry.level, count{}()
FROM binance
ARRAY JOIN entries AS entry
GROUP BY entry.level, symbol

GROUP BY does not have to reference a specific column name, as in the previous examples. You can also group by a calculated value:

SELECT THIS.*, avg{}(totalQuantity) FROM infoA GROUP BY (infoIdA % 3)

Depending on the server configuration, there may not be enough server memory to process all the created groups at once when their count is very large. In this case, groups are cached on disk, which can significantly degrade query performance. The threshold is controlled by the TimeBase server JVM system property maxGroupsCount, set to one million by default: -DTimeBase.qql.maxGroupsCount=1000000. Once the group count reaches maxGroupsCount, groups start to be cached on disk. In addition, each time the group count exceeds maxGroupsCount by 5%, the system performs a server memory check before the caching procedure. To release server memory, a query may be executed in several steps, processing maxGroupsCount groups per step, which can delay query processing. Keep this in mind when writing your queries.
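The threshold is set when the TimeBase server JVM starts; a lower value trades memory usage for earlier disk caching. A minimal sketch of the option as it would appear among the server's JVM arguments (the value shown is the default):

```
# JVM system property for the TimeBase server process (default value shown)
-DTimeBase.qql.maxGroupsCount=1000000
```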

Group by Timestamp

Grouping by timestamp is not an efficient operation because it produces a large number of groups. Use OVER TIME (1ms) instead when you need to aggregate by timestamp:

-- Sum total traded size for each timestamp
SELECT sum{}(entry.size) FROM binance
ARRAY JOIN (entries AS ARRAY(TradeEntry)) AS entry
OVER TIME (1ms)
WHERE entry IS NOT NULL